Abstract



We present a nearly-linear time algorithm that produces high-quality spectral sparsifiers of
weighted graphs. Given as input a weighted graph $G = (V,E,w)$ and a parameter $\epsilon > 0$, we
produce a weighted subgraph $H = (V,\tilde{E},\tilde{w})$ of $G$ such that $|\tilde{E}| = O(n \log n/\epsilon^2)$ and for all
vectors $x \in \mathbb{R}^V$



$$(1-\epsilon)\sum_{uv \in E}(x(u)-x(v))^2 w_{uv} \le \sum_{uv \in \tilde{E}}(x(u)-x(v))^2 \tilde{w}_{uv} \le (1+\epsilon)\sum_{uv \in E}(x(u)-x(v))^2 w_{uv}. \qquad (1)$$




This improves upon the spectral sparsifiers constructed by Spielman and Teng, which had
$O(n \log^c n)$ edges for some large constant $c$, and upon the cut sparsifiers of Benczúr and Karger,
which only satisfied (1) for $x \in \{0,1\}^V$.

A key ingredient in our algorithm is a subroutine of independent interest: a nearly-linear
time algorithm that builds a data structure from which we can query the approximate effective
resistance between any two vertices in a graph in $O(\log n)$ time.

1 Introduction 

The goal of sparsification is to approximate a given graph G by a sparse graph H on the same set 
of vertices. If H is close to G in some appropriate metric, then H can be used as a proxy for G in 
computations without introducing too much error. At the same time, since H has very few edges, 
computation with and storage of H should be cheaper. 

We study the notion of spectral sparsification introduced by Spielman and Teng [25]. Spectral
sparsification was inspired by the notion of cut sparsification introduced by Benczúr and Karger [5]
to accelerate cut algorithms whose running time depends on the number of edges. They gave a
nearly-linear time procedure which takes a graph $G$ on $n$ vertices with $m$ edges and a parameter
$\epsilon > 0$, and outputs a weighted subgraph $H$ with $O(n \log n/\epsilon^2)$ edges such that the weight of every
cut in $H$ is within a factor of $(1 \pm \epsilon)$ of its weight in $G$. This was used to turn Goldberg and
Tarjan's $O(mn)$ max-flow algorithm [16] into an $\tilde{O}(n^2)$ algorithm for approximate $st$-mincut, and
appeared more recently as the first step of an $\tilde{O}(n^{3/2} + m)$-time $O(\log^2 n)$ approximation algorithm
for sparsest cut [15].

The cut-preserving guarantee of [5] is equivalent to satisfying (1) for all $x \in \{0,1\}^n$, which are
the characteristic vectors of cuts. Spielman and Teng [23, 25] devised stronger sparsifiers which
extend (1) to all $x \in \mathbb{R}^n$, but have $O(n \log^c n)$ edges for some large constant $c$. They used these
sparsifiers to construct preconditioners for symmetric diagonally-dominant matrices, which led to
the first nearly-linear time solvers for such systems of equations.

In this work, we construct sparsifiers that achieve the same guarantee as Spielman and Teng's
but with $O(n \log n/\epsilon^2)$ edges, thus improving on both [5] and [23]. Our sparsifiers are subgraphs of
the original graph and can be computed in $\tilde{O}(m)$ time by random sampling, where the sampling
probabilities are given by the effective resistances of the edges. While this is conceptually much
simpler than the recursive partitioning approach of [23], we need to solve $O(\log n)$ linear systems to
compute the effective resistances quickly, and we do this using Spielman and Teng's linear equation
solver.

1.1 Our Results 

Our main idea is to include each edge of G in the sparsifier H with probability proportional to 
its effective resistance. The effective resistance of an edge is known to be equal to the probability 
that the edge appears in a random spanning tree of G (see, e.g., [9] or [6]), and was proven in 
[7] to be proportional to the commute time between the endpoints of the edge. We show how to 
approximate the effective resistances of edges in G quickly and prove that sampling according to 
these approximate values yields a good sparsifier. 

To define effective resistance, identify $G = (V,E,w)$ with an electrical network on $n$ nodes in
which each edge $e$ corresponds to a link of conductance $w_e$ (i.e., a resistor of resistance $1/w_e$). Then
the effective resistance $R_e$ across an edge $e$ is the potential difference induced across it when a unit
current is injected at one end of $e$ and extracted at the other end of $e$. Our algorithm can now be
stated as follows.

H = Sparsify(G, q)

Choose a random edge $e$ of $G$ with probability $p_e$ proportional to $w_e R_e$, and add $e$ to $H$
with weight $w_e/(q p_e)$. Take $q$ samples independently with replacement, summing weights if
an edge is chosen more than once.
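To make the sampling step concrete, here is a minimal Python sketch of Sparsify. It assumes the effective resistances $R_e$ are already available (for example from the procedure of Section 4) and uses Python's standard `random.choices` for sampling with replacement; the function and parameter names are illustrative, not part of the paper.

```python
import random
from collections import Counter


def sparsify(edges, resistances, q, rng=random):
    """Sketch of H = Sparsify(G, q).

    edges       -- list of (u, v, w) triples describing G
    resistances -- effective resistances R_e, in the same order as edges
    q           -- number of samples drawn with replacement
    Returns a dict mapping edge index -> weight of that edge in H.
    """
    scores = [w * r for (_, _, w), r in zip(edges, resistances)]
    total = sum(scores)
    probs = [s / total for s in scores]  # p_e proportional to w_e * R_e

    # Draw q edge indices independently with replacement according to p_e.
    counts = Counter(rng.choices(range(len(edges)), weights=probs, k=q))

    # Each sample of e contributes w_e / (q p_e); repeated samples add up.
    return {e: c * edges[e][2] / (q * probs[e]) for e, c in counts.items()}


# Usage: the unit-weight triangle has R_e = 2/3 on every edge.
triangle = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0)]
H = sparsify(triangle, [2 / 3] * 3, q=30, rng=random.Random(1))
print(H)
```

Keying the output by edge index turns the "summing weights" rule into a simple count. On the triangle every sample contributes $w_e/(q p_e) = 1/(30 \cdot 1/3) = 0.1$, so the total weight of $H$ is 3, the same as in $G$ for this symmetric example.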

Recall that the Laplacian of a weighted graph is given by $L = D - A$, where $A$ is the weighted
adjacency matrix $(a_{ij}) = w_{ij}$ and $D$ is the diagonal matrix $(d_{ii}) = \sum_{j \ne i} w_{ij}$ of weighted degrees.
Notice that the quadratic form associated with $L$ is just $x^T L x = \sum_{uv \in E}(x(u) - x(v))^2 w_{uv}$. Let $L$
be the Laplacian of $G$ and let $\tilde{L}$ be the Laplacian of $H$. Our main theorem is that if $q$ is sufficiently
large, then the quadratic forms of $L$ and $\tilde{L}$ are close.
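The identity $x^T L x = \sum_{uv \in E}(x(u)-x(v))^2 w_{uv}$ can be checked numerically. The following Python sketch (helper names are hypothetical) builds $L = D - A$ from an edge list and compares both sides using exact rational arithmetic.

```python
from fractions import Fraction


def laplacian(n, edges):
    """Dense Laplacian L = D - A of an undirected graph given as (u, v, w) triples."""
    L = [[Fraction(0)] * n for _ in range(n)]
    for u, v, w in edges:
        L[u][u] += w  # weighted degree d_uu
        L[v][v] += w  # weighted degree d_vv
        L[u][v] -= w  # minus the adjacency entry a_uv
        L[v][u] -= w
    return L


def quadratic_form(M, x):
    """x^T M x for a dense square matrix M."""
    n = len(x)
    return sum(x[i] * M[i][j] * x[j] for i in range(n) for j in range(n))


def edge_sum(edges, x):
    """Sum over edges uv of (x(u) - x(v))^2 * w_uv."""
    return sum((x[u] - x[v]) ** 2 * w for u, v, w in edges)


edges = [(0, 1, 2), (1, 2, 3), (0, 2, 1), (2, 3, 5)]
x = [1, -2, 5, 0]
print(quadratic_form(laplacian(4, edges), x), edge_sum(edges, x))  # both 306
```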

Theorem 1. Suppose $G$ and $H = \mathrm{Sparsify}(G,q)$ have Laplacians $L$ and $\tilde{L}$ respectively, and
$1/\sqrt{n} \le \epsilon \le 1$. If $q = 9C^2 n \log n/\epsilon^2$, where $C$ is the constant in Lemma 5, and if $n$ is sufficiently
large, then with probability at least 1/2

$$\forall x \in \mathbb{R}^n \quad (1-\epsilon)\, x^T L x \le x^T \tilde{L} x \le (1+\epsilon)\, x^T L x. \qquad (2)$$
Sparsifiers that satisfy this condition preserve many properties of the graph. The Courant-Fischer
Theorem tells us that

$$\lambda_k = \max_{S:\dim(S)=k}\ \min_{x \in S} \frac{x^T L x}{x^T x}.$$

Thus, if $\lambda_1, \ldots, \lambda_n$ are the eigenvalues of $L$ and $\tilde{\lambda}_1, \ldots, \tilde{\lambda}_n$ are the eigenvalues of $\tilde{L}$, then we have

$$(1-\epsilon)\lambda_i \le \tilde{\lambda}_i \le (1+\epsilon)\lambda_i,$$

and the eigenspaces spanned by corresponding eigenvalues are related. As the eigenvalues of the
normalized Laplacian are given by

$$\lambda_k = \max_{S:\dim(S)=k}\ \min_{x \in S} \frac{x^T D^{-1/2} L D^{-1/2} x}{x^T x},$$

and are the same as the eigenvalues of the walk matrix $D^{-1}L$, we obtain the same relationship
between the eigenvalues of the walk matrix of the original graph and its sparsifier. Many properties
of graphs and random walks are known to be revealed by their spectra (see for example [6, 10, 14]).
The existence of sparse subgraphs which retain these properties is interesting in its own right; indeed,
expander graphs can be viewed as constant degree sparsifiers for the complete graph.
We remark that the condition (2) also implies

$$\forall x \in \mathbb{R}^n \quad \frac{1}{1+\epsilon}\, x^T L^+ x \le x^T \tilde{L}^+ x \le \frac{1}{1-\epsilon}\, x^T L^+ x,$$

where $L^+$ is the pseudoinverse of $L$. Thus sparsifiers also approximately preserve the effective
resistances between vertices, since for vertices $u$ and $v$, the effective resistance between them is
given by the formula $(\chi_u - \chi_v)^T L^+ (\chi_u - \chi_v)$, where $\chi_u$ is the elementary unit vector with a
coordinate 1 in position $u$.
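As a sanity check of the resistance formula, the Python sketch below computes $L^+$ exactly for a small connected graph. Rather than diagonalizing $L$, it uses the standard identity $L^+ = (L + J/n)^{-1} - J/n$ for connected graphs, where $J$ is the all-ones matrix (adding $J/n$ turns the zero eigenvalue on $\mathbf{1}$ into 1 and leaves the other eigenpairs unchanged). All helper names are illustrative.

```python
from fractions import Fraction


def laplacian(n, edges):
    """Dense Laplacian L = D - A from (u, v, w) triples."""
    L = [[Fraction(0)] * n for _ in range(n)]
    for u, v, w in edges:
        L[u][u] += w
        L[v][v] += w
        L[u][v] -= w
        L[v][u] -= w
    return L


def inverse(M):
    """Gauss-Jordan inverse of a nonsingular matrix of Fractions (exact arithmetic)."""
    n = len(M)
    A = [list(row) + [Fraction(int(i == j)) for j in range(n)] for i, row in enumerate(M)]
    for c in range(n):
        p = next(r for r in range(c, n) if A[r][c] != 0)  # any nonzero pivot is exact
        A[c], A[p] = A[p], A[c]
        piv = A[c][c]
        A[c] = [a / piv for a in A[c]]
        for r in range(n):
            if r != c and A[r][c] != 0:
                f = A[r][c]
                A[r] = [a - f * pc for a, pc in zip(A[r], A[c])]
    return [row[n:] for row in A]


def pseudoinverse(L):
    """L^+ = (L + J/n)^{-1} - J/n, valid when L is the Laplacian of a connected graph."""
    n = len(L)
    c = Fraction(1, n)  # every entry of J/n
    inv = inverse([[L[i][j] + c for j in range(n)] for i in range(n)])
    return [[inv[i][j] - c for j in range(n)] for i in range(n)]


def effective_resistance(Lp, u, v):
    """(chi_u - chi_v)^T L^+ (chi_u - chi_v), expanded entrywise."""
    return Lp[u][u] + Lp[v][v] - 2 * Lp[u][v]


triangle = laplacian(3, [(0, 1, 1), (1, 2, 1), (0, 2, 1)])
path = laplacian(3, [(0, 1, 1), (1, 2, 1)])
print(effective_resistance(pseudoinverse(triangle), 0, 1))  # 2/3: 1 ohm in parallel with 2 ohms
print(effective_resistance(pseudoinverse(path), 0, 2))      # 2: two 1-ohm resistors in series
```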

We prove Theorem 1 in Section 3. At the end of Section 3, we prove that the spectral guarantee
(2) of Theorem 1 is not harmed too much if we use approximate effective resistances for sampling
instead of exact ones (Corollary 6).

In Section 4, we show how to compute approximate effective resistances in nearly-linear time,
which is essentially optimal. The tools we use to do this are Spielman and Teng's nearly-linear time
solver [23] and the Johnson-Lindenstrauss Lemma [18, 1]. Specifically, we prove the following
theorem, in which $R_{uv}$ denotes the effective resistance between vertices $u$ and $v$.

Theorem 2. There is an $\tilde{O}(m (\log r)/\epsilon^2)$ time algorithm which on input $\epsilon > 0$ and $G = (V,E,w)$
with $r = w_{max}/w_{min}$ computes a $(24 \log n/\epsilon^2) \times n$ matrix $\tilde{Z}$ such that with probability at least $1 - 1/n$

$$(1-\epsilon) R_{uv} \le \|\tilde{Z}(\chi_u - \chi_v)\|^2 \le (1+\epsilon) R_{uv}$$

for every pair of vertices $u, v \in V$.

Since $\tilde{Z}(\chi_u - \chi_v)$ is simply the difference of the corresponding two columns of $\tilde{Z}$, we can query
the approximate effective resistance between any pair of vertices $(u,v)$ in time $O(\log n/\epsilon^2)$, and
for all the edges in time $O(m \log n/\epsilon^2)$. By Corollary 6, this yields an $\tilde{O}(m (\log r)/\epsilon^2)$ time algorithm for
sparsifying graphs, as advertised.
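A query against $\tilde{Z}$ is just a column difference, as the following Python sketch shows. It assumes `Z` is stored as a list of $k$ rows of length $n$; the toy matrix and function names are made up for illustration.

```python
def approx_resistance(Z, u, v):
    """||Z (chi_u - chi_v)||^2: squared distance between columns u and v of Z.

    Z is stored as a list of k rows, each of length n, so one query costs O(k).
    """
    return sum((row[u] - row[v]) ** 2 for row in Z)


def all_edge_resistances(Z, edges):
    """Approximate R_e for every edge (u, v, w): O(m k) time in total."""
    return [approx_resistance(Z, u, v) for u, v, _ in edges]


Z = [[0.0, 1.0, 3.0],
     [2.0, 2.0, 0.0]]  # a toy 2 x 3 matrix standing in for Z~
print(approx_resistance(Z, 0, 2))  # (0 - 3)^2 + (2 - 0)^2 = 13.0
```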

In Section 5, we show that H can be made close to G in some additional ways which make it 
more useful for preconditioning systems of linear equations. 
1.2 Related Work



Batson, Spielman, and Srivastava [4] have given a deterministic algorithm that constructs sparsifiers
of size $O(n/\epsilon^2)$ in $O(mn^3/\epsilon^2)$ time. While this is too slow to be useful in applications, it is optimal
in terms of the tradeoff between sparsity and quality of approximation and can be viewed as
generalizing expander graphs. Their construction parallels ours in that it reduces the task of
spectral sparsification to approximating the matrix $\Pi$ defined in Section 3; however, their method
for selecting edges is iterative and more delicate than the random sampling described in this paper.

In addition to the graph sparsifiers of [23, 25], there is a large body of work on sparse [2, 3] and
low-rank [2, 11, 14, 22] approximations for general matrices. The algorithms in this literature
provide guarantees of the form $\|A - \tilde{A}\|_2 \le \epsilon$, where $A$ is the original matrix and $\tilde{A}$ is obtained by
entrywise or columnwise sampling of $A$. This is analogous to satisfying (1) only for vectors $x$ in
the span of the dominant eigenvectors of $A$; thus, if we were to use these sparsifiers on graphs, they
would only preserve the large cuts. Interestingly, our proof uses some of the same machinery as
the low-rank approximation result of Rudelson and Vershynin [22]: the sampling of edges in our
algorithm corresponds to picking $q = O(n \log n)$ columns at random from a certain rank $(n-1)$
matrix of dimension $m \times m$ (this is the matrix $\Pi$ introduced in Section 3).

The use of effective resistance as a distance in graphs has recently gained attention as it is often 
more useful than the ordinary geodesic distance in a graph. For example, in small-world graphs, 
all vertices will be close to one another, but those with a smaller effective resistance distance are 
connected by more short paths. See, for instance, [12, 13], which use effective resistance/commute
time as a distance measure in social network graphs.



2 Preliminaries 

2.1 The Incidence Matrix and the Laplacian 

Let $G = (V,E,w)$ be a connected weighted undirected graph with $n$ vertices and $m$ edges and edge
weights $w_e > 0$. If we orient the edges of $G$ arbitrarily, we can write its Laplacian as $L = B^T W B$,
where $B_{m \times n}$ is the signed edge-vertex incidence matrix, given by

$$B(e,v) = \begin{cases} 1 & \text{if } v \text{ is } e\text{'s head} \\ -1 & \text{if } v \text{ is } e\text{'s tail} \\ 0 & \text{otherwise} \end{cases}$$

and $W_{m \times m}$ is the diagonal matrix with $W(e,e) = w_e$. Denote the row vectors of $B$ by $\{b_e\}_{e \in E}$
and the span of its columns by $\mathbb{B} = \mathrm{im}(B) \subseteq \mathbb{R}^m$ (also called the cut space of $G$ [18]). Note that

$$b_{(u,v)}^T = (\chi_v - \chi_u).$$

It is immediate that $L$ is positive semidefinite since

$$x^T L x = x^T B^T W B x = \|W^{1/2} B x\|_2^2 \ge 0 \quad \text{for every } x \in \mathbb{R}^n.$$
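The factorization $L = B^T W B$ can be checked on a small example. In this Python sketch (names hypothetical), each edge $(u, v)$ is oriented with tail $u$ and head $v$; choosing the other orientation negates a row of $B$ and leaves $B^T W B$ unchanged.

```python
def incidence(n, edges):
    """Signed m x n incidence matrix: edge (u, v) has tail u and head v."""
    B = []
    for u, v, _ in edges:
        row = [0] * n
        row[v] = 1   # v is the head of e
        row[u] = -1  # u is the tail of e
        B.append(row)
    return B


def bt_w_b(B, weights):
    """Compute B^T W B with W = diag(weights), entry by entry."""
    n = len(B[0])
    return [[sum(w * row[i] * row[j] for row, w in zip(B, weights)) for j in range(n)]
            for i in range(n)]


edges = [(0, 1, 2), (1, 2, 3), (0, 2, 1), (2, 3, 5)]
B = incidence(4, edges)
L = bt_w_b(B, [w for _, _, w in edges])
for row in L:
    print(row)  # weighted degrees on the diagonal, -w_uv off the diagonal
```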
We also have $\ker(L) = \ker(W^{1/2}B) = \mathrm{span}(\mathbf{1})$, since

$$\begin{aligned} x^T L x = 0 &\iff \|W^{1/2} B x\|_2^2 = 0 \\ &\iff \sum_{uv \in E} w_{uv}(x(u) - x(v))^2 = 0 \\ &\iff x(u) - x(v) = 0 \text{ for all edges } (u,v) \\ &\iff x \text{ is constant, since } G \text{ is connected.} \end{aligned}$$

2.2 The Pseudoinverse 

Since $L$ is symmetric we can diagonalize it and write

$$L = \sum_{i=1}^{n-1} \lambda_i u_i u_i^T,$$

where $\lambda_1, \ldots, \lambda_{n-1}$ are the nonzero eigenvalues of $L$ and $u_1, \ldots, u_{n-1}$ are a corresponding set of
orthonormal eigenvectors. The Moore-Penrose Pseudoinverse of $L$ is then defined as

$$L^+ = \sum_{i=1}^{n-1} \frac{1}{\lambda_i} u_i u_i^T.$$

Notice that $\ker(L) = \ker(L^+)$ and that

$$L L^+ = L^+ L = \sum_{i=1}^{n-1} u_i u_i^T,$$

which is simply the projection onto the span of the nonzero eigenvectors of $L$ (which are also the
eigenvectors of $L^+$). Thus, $LL^+ = L^+L$ is the identity on $\mathrm{im}(L) = \ker(L)^\perp = \mathrm{span}(\mathbf{1})^\perp$. We will
rely on this fact heavily in the proof of Theorem 1.

2.3 Electrical Flows 

Begin by arbitrarily orienting the edges of $G$ as in Section 2.1. We will use the same notation as
[17] to describe electrical flows on graphs: for a vector $i_{ext}(u)$ of currents injected at the vertices, let
$i(e)$ be the currents induced in the edges (in the direction of orientation) and $v(u)$ the potentials
induced at the vertices. By Kirchhoff's current law, the sum of the currents entering a vertex is
equal to the amount injected at the vertex:

$$B^T i = i_{ext}.$$

By Ohm's law, the current flow in an edge is equal to the potential difference across its ends times
its conductance:

$$i = W B v.$$

Combining these two facts, we obtain

$$i_{ext} = B^T (W B v) = L v.$$
If $i_{ext} \perp \mathrm{span}(\mathbf{1}) = \ker(L)$, i.e., if the total amount of current injected is equal to the total
amount extracted, then we can write

$$v = L^+ i_{ext}$$

by the definition of $L^+$ in Section 2.2.

Recall that the effective resistance between two vertices $u$ and $v$ is defined as the potential
difference induced between them when a unit current is injected at one and extracted at the other.
We will derive an algebraic expression for the effective resistance in terms of $L^+$. To inject and
extract a unit current across the endpoints of an edge $e = (u,v)$, we set $i_{ext} = b_e^T = (\chi_v - \chi_u)$, which
is clearly orthogonal to $\mathbf{1}$. The potentials induced by $i_{ext}$ at the vertices are given by $v = L^+ b_e^T$; to
measure the potential difference across $e = (u,v)$, we simply multiply by $b_e$ on the left:

$$v(v) - v(u) = (\chi_v - \chi_u)^T v = b_e L^+ b_e^T.$$

It follows that the effective resistance across $e$ is given by $b_e L^+ b_e^T$ and that the matrix $BL^+B^T$ has
as its diagonal entries $BL^+B^T(e,e) = R_e$.
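The following Python sketch traces this electrical interpretation on the unit triangle. Instead of forming $L^+$, it grounds one vertex (fixes its potential to 0) and solves the remaining nonsingular system. Because $i_{ext} \perp \mathbf{1}$, this yields a solution of $Lv = i_{ext}$ that differs from $L^+ i_{ext}$ only by a constant, so potential differences and currents are unaffected. The helper names are hypothetical.

```python
from fractions import Fraction


def solve(A, b):
    """Gaussian elimination with back substitution for a nonsingular system (exact)."""
    n = len(A)
    M = [[Fraction(a) for a in row] + [Fraction(bi)] for row, bi in zip(A, b)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            M[r] = [a - f * pc for a, pc in zip(M[r], M[c])]
    x = [Fraction(0)] * n
    for r in range(n - 1, -1, -1):
        s = M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))
        x[r] = s / M[r][r]
    return x


def potentials(n, edges, i_ext):
    """Solve L v = i_ext (with sum(i_ext) == 0) by grounding vertex n - 1 at potential 0."""
    L = [[0] * n for _ in range(n)]
    for u, v, w in edges:
        L[u][u] += w
        L[v][v] += w
        L[u][v] -= w
        L[v][u] -= w
    reduced = [row[:-1] for row in L[:-1]]  # drop the grounded row and column
    return solve(reduced, i_ext[:-1]) + [Fraction(0)]


edges = [(0, 1, 1), (1, 2, 1), (0, 2, 1)]  # unit triangle; edge e = (0, 1) has tail 0, head 1
i_ext = [-1, 1, 0]                          # b_e^T = chi_1 - chi_0
pot = potentials(3, edges, i_ext)
currents = [w * (pot[v] - pot[u]) for u, v, w in edges]  # Ohm's law: i = W B v
print(pot[1] - pot[0])  # effective resistance of e: 2/3
```

Here `currents` satisfies Kirchhoff's law $B^T i = i_{ext}$, and the potential difference across $e = (0,1)$ equals $b_e L^+ b_e^T = 2/3$.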

3 The Main Result 

We will prove Theorem 1. Consider the matrix $\Pi = W^{1/2} B L^+ B^T W^{1/2}$. Since we know $BL^+B^T(e,e) =
R_e$, the diagonal entries of $\Pi$ are $\Pi(e,e) = \sqrt{W(e,e)}\, R_e \sqrt{W(e,e)} = w_e R_e$. $\Pi$ has some notable
properties.

Lemma 3 (Projection Matrix). (i) $\Pi$ is a projection matrix. (ii) $\mathrm{im}(\Pi) = \mathrm{im}(W^{1/2}B) = W^{1/2}\mathbb{B}$.
(iii) The eigenvalues of $\Pi$ are 1 with multiplicity $n-1$ and 0 with multiplicity $m-n+1$. (iv)
$\Pi(e,e) = \|\Pi(\cdot,e)\|^2$.

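Proof. To see (i), observe that

$$\begin{aligned} \Pi^2 &= (W^{1/2} B L^+ B^T W^{1/2})(W^{1/2} B L^+ B^T W^{1/2}) \\ &= W^{1/2} B L^+ (B^T W B) L^+ B^T W^{1/2} \\ &= W^{1/2} B L^+ L L^+ B^T W^{1/2} && \text{since } L = B^T W B \\ &= W^{1/2} B L^+ B^T W^{1/2} && \text{since } L^+ L \text{ is the identity on } \mathrm{im}(L^+) \\ &= \Pi. \end{aligned}$$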

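For (ii), we have

$$\mathrm{im}(\Pi) = \mathrm{im}(W^{1/2} B L^+ B^T W^{1/2}) \subseteq \mathrm{im}(W^{1/2} B).$$

To see the other inclusion, assume $y \in \mathrm{im}(W^{1/2}B)$. Then we can choose $x \perp \ker(W^{1/2}B) = \ker(L)$
such that $W^{1/2}Bx = y$. But now

$$\begin{aligned} \Pi y &= W^{1/2} B L^+ B^T W^{1/2} W^{1/2} B x \\ &= W^{1/2} B L^+ L x && \text{since } B^T W B = L \\ &= W^{1/2} B x && \text{since } L^+ L x = x \text{ for } x \perp \ker(L) \\ &= y. \end{aligned}$$

Thus $y \in \mathrm{im}(\Pi)$, as desired.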

For (iii), recall from Section 2.1 that $\dim(\ker(W^{1/2}B)) = 1$. Consequently, $\dim(\mathrm{im}(\Pi)) =
\dim(\mathrm{im}(W^{1/2}B)) = n - 1$. But since $\Pi^2 = \Pi$, the eigenvalues of $\Pi$ are all 0 or 1, and as $\Pi$ projects
onto a space of dimension $n-1$, it must have exactly $n-1$ nonzero eigenvalues.

(iv) follows from $\Pi^2(e,e) = \Pi(\cdot,e)^T \Pi(\cdot,e)$, since $\Pi$ is symmetric. □
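Lemma 3 can be checked numerically on a small weighted graph. The Python sketch below (illustrative names; it relies on the identity $L^+ = (L + J/n)^{-1} - J/n$ for connected graphs) forms $\Pi(e,f) = \sqrt{w_e w_f}\, b_e L^+ b_f^T$ entrywise and confirms $\Pi^2 = \Pi$ and $\mathrm{Tr}(\Pi) = n - 1$ up to floating-point error.

```python
import math
from fractions import Fraction


def pseudoinverse(n, edges):
    """L^+ = (L + J/n)^{-1} - J/n for a connected graph, by exact Gauss-Jordan elimination."""
    c = Fraction(1, n)
    # Augmented matrix [L + J/n | I]; the left half starts as J/n and L is added edge by edge.
    A = [[c] * n + [Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for u, v, w in edges:
        A[u][u] += w
        A[v][v] += w
        A[u][v] -= w
        A[v][u] -= w
    for col in range(n):
        p = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[p] = A[p], A[col]
        piv = A[col][col]
        A[col] = [a / piv for a in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0:
                f = A[r][col]
                A[r] = [a - f * pc for a, pc in zip(A[r], A[col])]
    return [[A[i][n + j] - c for j in range(n)] for i in range(n)]


def projection_matrix(n, edges):
    """Pi(e, f) = sqrt(w_e w_f) * b_e L^+ b_f^T, where b_e = chi_head - chi_tail."""
    Lp = pseudoinverse(n, edges)

    def b_lp_b(e, f):
        (te, he, _), (tf, hf, _) = edges[e], edges[f]
        return Lp[he][hf] - Lp[he][tf] - Lp[te][hf] + Lp[te][tf]

    m = len(edges)
    return [[math.sqrt(edges[e][2] * edges[f][2]) * float(b_lp_b(e, f)) for f in range(m)]
            for e in range(m)]


edges = [(0, 1, 1), (1, 2, 4), (2, 3, 9), (3, 0, 1), (0, 2, 4)]  # n = 4, m = 5
Pi = projection_matrix(4, edges)
m = len(edges)
trace = sum(Pi[e][e] for e in range(m))
Pi_sq = [[sum(Pi[e][g] * Pi[g][f] for g in range(m)) for f in range(m)] for e in range(m)]
print(round(trace, 9))  # Tr(Pi) = n - 1 = 3
print(max(abs(Pi_sq[e][f] - Pi[e][f]) for e in range(m) for f in range(m)))  # ~0, so Pi^2 = Pi
```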

To show that $H = (V,\tilde{E},\tilde{w})$ is a good sparsifier for $G$, we need to show that the quadratic
forms $x^T L x$ and $x^T \tilde{L} x$ are close. We start by reducing the problem of preserving $x^T L x$ to that
of preserving $y^T \Pi y$. This will be much nicer since the eigenvalues of $\Pi$ are all 0 or 1, so that any
matrix $\tilde{\Pi}$ which approximates $\Pi$ in the spectral norm (i.e., makes $\|\tilde{\Pi} - \Pi\|_2$ small) also preserves
its quadratic form.

We may describe the outcome of $H = \mathrm{Sparsify}(G,q)$ by the following random matrix:

$$S(e,e) = \frac{\tilde{w}_e}{w_e} = \frac{(\#\text{ of times } e \text{ is sampled})}{q p_e}. \qquad (3)$$

$S_{m \times m}$ is a nonnegative diagonal matrix and the random entry $S(e,e)$ specifies the 'amount' of edge
$e$ included in $H$ by Sparsify. For example $S(e,e) = 1/(qp_e)$ if $e$ is sampled once, $2/(qp_e)$ if it is sampled
twice, and zero if it is not sampled at all. The weight of $e$ in $H$ is now given by $\tilde{w}_e = S(e,e)\, w_e$,
and we can write the Laplacian of $H$ as:

$$\tilde{L} = B^T \tilde{W} B = B^T W^{1/2} S W^{1/2} B$$

since $\tilde{W} = WS = W^{1/2} S W^{1/2}$. The scaling of weights by $1/(qp_e)$ in Sparsify implies that $\mathbb{E}\tilde{w}_e = w_e$
(since $q$ independent samples are taken, each with probability $p_e$), and thus $\mathbb{E}S = I$ and $\mathbb{E}\tilde{L} = L$.

We can now prove the following lemma, which says that if $S$ does not distort $y^T \Pi y$ too much
then $x^T L x$ and $x^T \tilde{L} x$ are close.

Lemma 4. Suppose $S$ is a nonnegative diagonal matrix such that

$$\|\Pi S \Pi - \Pi\Pi\|_2 \le \epsilon.$$

Then

$$\forall x \in \mathbb{R}^n \quad (1-\epsilon)\, x^T L x \le x^T \tilde{L} x \le (1+\epsilon)\, x^T L x,$$

where $L = B^T W B$ and $\tilde{L} = B^T W^{1/2} S W^{1/2} B$.

Proof. The assumption is equivalent to

$$\sup_{y \in \mathbb{R}^m,\, y \ne 0} \frac{|y^T \Pi (S - I) \Pi y|}{y^T y} \le \epsilon$$

since $\|A\|_2 = \sup_{y \ne 0} |y^T A y|/(y^T y)$ for symmetric $A$. Restricting our attention to vectors in $\mathrm{im}(W^{1/2}B)$,
we have

$$\sup_{y \in \mathrm{im}(W^{1/2}B),\, y \ne 0} \frac{|y^T \Pi (S - I) \Pi y|}{y^T y} \le \epsilon.$$

But by Lemma 3 (ii), $\Pi$ is the identity on $\mathrm{im}(W^{1/2}B)$, so $\Pi y = y$ for all $y \in \mathrm{im}(W^{1/2}B)$. Also,
every such $y$ can be written as $y = W^{1/2}Bx$ for $x \in \mathbb{R}^n$. Substituting this into the above expression
we obtain:

$$\begin{aligned} \sup_{y \in \mathrm{im}(W^{1/2}B),\, y \ne 0} \frac{|y^T \Pi (S - I) \Pi y|}{y^T y} &= \sup_{y \in \mathrm{im}(W^{1/2}B),\, y \ne 0} \frac{|y^T (S - I) y|}{y^T y} \\ &= \sup_{x \in \mathbb{R}^n,\, W^{1/2}Bx \ne 0} \frac{|x^T B^T W^{1/2} S W^{1/2} B x - x^T B^T W B x|}{x^T B^T W B x} \\ &= \sup_{x \in \mathbb{R}^n,\, W^{1/2}Bx \ne 0} \frac{|x^T \tilde{L} x - x^T L x|}{x^T L x} \le \epsilon. \end{aligned}$$

Rearranging yields the desired conclusion for all $x \notin \ker(W^{1/2}B)$. When $x \in \ker(W^{1/2}B)$, then
$x^T L x = x^T \tilde{L} x = 0$ and the claim holds trivially. □

To show that $\|\Pi S \Pi - \Pi\Pi\|_2$ is likely to be small we use the following concentration result,
which is a sort of law of large numbers for symmetric rank 1 matrices. It was first proven by
Rudelson in [21], but the version we state here appears in the more recent paper [22] by Rudelson
and Vershynin.

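Lemma 5 (Rudelson & Vershynin, [22] Thm. 3.1). Let $p$ be a probability distribution over $\Omega \subseteq \mathbb{R}^d$
such that $\sup_{y \in \Omega} \|y\|_2 \le M$ and $\|\mathbb{E}_p\, yy^T\|_2 \le 1$. Let $y_1, \ldots, y_q$ be independent samples drawn from
$p$. Then

$$\mathbb{E}\left\| \frac{1}{q}\sum_{i=1}^q y_i y_i^T - \mathbb{E}\, yy^T \right\|_2 \le \min\left( CM\sqrt{\frac{\log q}{q}},\ 1 \right)$$

where $C$ is an absolute constant.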

We can now finish the proof of Theorem 1.

Proof of Theorem 1. Sparsify samples edges from $G$ independently with replacement, with
probabilities $p_e$ proportional to $w_e R_e$. Since $\sum_e w_e R_e = \mathrm{Tr}(\Pi) = n - 1$ by Lemma 3 (iii), the actual
probability distribution over $E$ is given by $p_e = \frac{w_e R_e}{n-1}$. Sampling $q$ edges from $G$ corresponds to
sampling $q$ columns from $\Pi$, so we can write

$$\begin{aligned} \Pi S \Pi &= \sum_e S(e,e)\, \Pi(\cdot,e)\Pi(\cdot,e)^T \\ &= \sum_e \frac{(\#\text{ of times } e \text{ is sampled})}{q p_e}\, \Pi(\cdot,e)\Pi(\cdot,e)^T && \text{by (3)} \\ &= \frac{1}{q}\sum_e (\#\text{ of times } e \text{ is sampled})\, \frac{\Pi(\cdot,e)}{\sqrt{p_e}}\frac{\Pi(\cdot,e)^T}{\sqrt{p_e}} \\ &= \frac{1}{q}\sum_{i=1}^q y_i y_i^T \end{aligned}$$

for vectors $y_1, \ldots, y_q$ drawn independently with replacement from the distribution

$$y = \frac{1}{\sqrt{p_e}}\Pi(\cdot,e) \quad \text{with probability } p_e.$$
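We can now apply Lemma 5. The expectation of $yy^T$ is given by

$$\mathbb{E}\, yy^T = \sum_e p_e \frac{1}{p_e}\Pi(\cdot,e)\Pi(\cdot,e)^T = \Pi\Pi = \Pi,$$

so $\|\mathbb{E}\, yy^T\|_2 = \|\Pi\|_2 = 1$. We also have a bound on the norm of $y$:

$$\frac{1}{\sqrt{p_e}}\|\Pi(\cdot,e)\|_2 = \frac{1}{\sqrt{p_e}}\sqrt{\Pi(e,e)} = \sqrt{\frac{n-1}{w_e R_e}}\sqrt{R_e w_e} = \sqrt{n-1}.$$

Taking $q = 9C^2 n \log n/\epsilon^2$ gives:

$$\mathbb{E}\|\Pi S \Pi - \Pi\Pi\|_2 = \mathbb{E}\left\|\frac{1}{q}\sum_{i=1}^q y_i y_i^T - \mathbb{E}\, yy^T\right\|_2 \le C\epsilon\sqrt{\frac{\log(9C^2 n \log n/\epsilon^2)\,(n-1)}{9C^2 n \log n}} \le \epsilon/2,$$

for $n$ sufficiently large, as $\epsilon$ is assumed to be at least $1/\sqrt{n}$.
By Markov's inequality, we have

$$\|\Pi S \Pi - \Pi\|_2 \le \epsilon$$

with probability at least 1/2. By Lemma 4, this completes the proof of the theorem. □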



We now show that using approximate resistances for sampling does not damage the sparsifier
very much.

Corollary 6. Suppose $Z_e$ are numbers satisfying $Z_e \ge R_e/\alpha$ and $\sum_e w_e Z_e \le \alpha \sum_e w_e R_e$ for some
$\alpha \ge 1$. If we sample as in Sparsify but take each edge with probability $p'_e = \frac{w_e Z_e}{\sum_e w_e Z_e}$ instead of
$p_e = \frac{w_e R_e}{\sum_e w_e R_e}$, then $H$ satisfies:

$$(1 - \epsilon\alpha)\, x^T L x \le x^T \tilde{L} x \le (1 + \epsilon\alpha)\, x^T L x \quad \forall x \in \mathbb{R}^n,$$

with probability at least 1/2.

Proof. We note that

$$p'_e = \frac{w_e Z_e}{\sum_e w_e Z_e} \ge \frac{w_e (R_e/\alpha)}{\alpha \sum_e w_e R_e} = \frac{p_e}{\alpha^2}$$

and proceed as in the proof of Theorem 1. The norm of the random vector $y$ is now bounded by:

$$\frac{1}{\sqrt{p'_e}}\|\Pi(\cdot,e)\|_2 \le \frac{\alpha}{\sqrt{p_e}}\sqrt{\Pi(e,e)} = \alpha\sqrt{n-1},$$

which introduces a factor of $\alpha$ into the final bound on the expectation, but changes nothing else. □
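The hypotheses of Corollary 6 and the resulting bound $p'_e \ge p_e/\alpha^2$ are easy to check directly. In this Python sketch the estimates `Z` are made-up numbers for the unit triangle, and the function names are hypothetical.

```python
from fractions import Fraction


def sampling_probs(weights, scores):
    """p_e = w_e s_e / sum_f w_f s_f for positive per-edge scores s_e."""
    total = sum(w * s for w, s in zip(weights, scores))
    return [w * s / total for w, s in zip(weights, scores)]


def satisfies_corollary6(weights, R, Z, alpha):
    """Check Z_e >= R_e / alpha for all e and sum_e w_e Z_e <= alpha * sum_e w_e R_e."""
    pointwise = all(z >= r / alpha for r, z in zip(R, Z))
    total_ok = (sum(w * z for w, z in zip(weights, Z))
                <= alpha * sum(w * r for w, r in zip(weights, R)))
    return pointwise and total_ok


weights = [1, 1, 1]
R = [Fraction(2, 3)] * 3                              # exact resistances of the unit triangle
Z = [Fraction(1, 3), Fraction(2, 3), Fraction(4, 3)]  # hypothetical rough estimates
alpha = 2
p = sampling_probs(weights, R)
p_approx = sampling_probs(weights, Z)
print(satisfies_corollary6(weights, R, Z, alpha))                   # True
print(all(pa >= pe / alpha ** 2 for pa, pe in zip(p_approx, p)))  # True
```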
4 Computing Approximate Resistances Quickly

It is not clear how to compute all the effective resistances $\{R_e\}$ exactly and efficiently. In this
section, we show that one can compute constant factor approximations to all the $R_e$ in time
$\tilde{O}(m \log r)$. In fact, we do something stronger: we build an $O(\log n) \times n$ matrix $\tilde{Z}$ from which the
effective resistance between any two vertices (including vertices not connected by an edge) can be
computed in $O(\log n)$ time.

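Proof of Theorem 2. If $u$ and $v$ are vertices in $G$, then the effective resistance between $u$ and $v$ can
be written as:

$$\begin{aligned} R_{uv} &= (\chi_u - \chi_v)^T L^+ (\chi_u - \chi_v) \\ &= (\chi_u - \chi_v)^T L^+ L L^+ (\chi_u - \chi_v) \\ &= \left((\chi_u - \chi_v)^T L^+ B^T W^{1/2}\right)\left(W^{1/2} B L^+ (\chi_u - \chi_v)\right) \\ &= \|W^{1/2} B L^+ (\chi_u - \chi_v)\|_2^2. \end{aligned}$$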

Thus effective resistances are just pairwise distances between vectors in $\{W^{1/2} B L^+ \chi_v\}_{v \in V}$. By
the Johnson-Lindenstrauss Lemma, these distances are preserved if we project the vectors onto a
subspace spanned by $O(\log n)$ random vectors. For concreteness, we use the following version of
the Johnson-Lindenstrauss Lemma due to Achlioptas [1].

Lemma 7. Given fixed vectors $v_1, \ldots, v_n \in \mathbb{R}^d$ and $\epsilon > 0$, let $Q_{k \times d}$ be a random $\pm 1/\sqrt{k}$ matrix (i.e.,
independent Bernoulli entries) with $k \ge 24 \log n/\epsilon^2$. Then with probability at least $1 - 1/n$

$$(1-\epsilon)\|v_i - v_j\|_2^2 \le \|Qv_i - Qv_j\|_2^2 \le (1+\epsilon)\|v_i - v_j\|_2^2$$

for all pairs $i, j \le n$.
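Lemma 7's random projection is simple to generate. The following Python sketch builds a $\pm 1/\sqrt{k}$ matrix with $k = \lceil 24 \log n/\epsilon^2 \rceil$, assuming $\log$ denotes the natural logarithm; the function names are illustrative.

```python
import math
import random


def jl_dimension(n, eps):
    """Smallest integer k with k >= 24 log n / eps^2 (natural log assumed)."""
    return math.ceil(24 * math.log(n) / eps ** 2)


def achlioptas_matrix(k, d, rng=random):
    """k x d matrix whose entries are independently +1/sqrt(k) or -1/sqrt(k)."""
    s = 1 / math.sqrt(k)
    return [[s if rng.random() < 0.5 else -s for _ in range(d)] for _ in range(k)]


def project(Q, x):
    """Matrix-vector product Q x."""
    return [sum(q * xi for q, xi in zip(row, x)) for row in Q]


k = jl_dimension(100, 0.5)  # 443
Q = achlioptas_matrix(k, 10, random.Random(0))
x = [1.0] * 10
ratio = sum(y * y for y in project(Q, x)) / sum(xi * xi for xi in x)
print(k, ratio)  # the squared-norm ratio should be close to 1
```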

Our goal is now to compute the projections $\{Q W^{1/2} B L^+ \chi_v\}$. We will exploit the linear system
solver of Spielman and Teng [23, 24], which we recall satisfies:

Theorem 8 (Spielman-Teng). There is an algorithm $x = \mathrm{STSolve}(L, y, \delta)$ which takes a Laplacian
matrix $L$, a column vector $y$, and an error parameter $\delta > 0$, and returns a column vector $x$ satisfying

$$\|x - L^+ y\|_L \le \delta \|L^+ y\|_L,$$

where $\|y\|_L = \sqrt{y^T L y}$. The algorithm runs in expected time $\tilde{O}(m \log(1/\delta))$, where $m$ is the number
of non-zero entries in $L$.

Let $Z = Q W^{1/2} B L^+$. We will compute an approximation $\tilde{Z}$ by using STSolve to approximately
compute the rows of $Z$. Let the column vectors $z_i$ and $\tilde{z}_i$ denote the $i$th rows of $Z$ and $\tilde{Z}$, respectively
(so that $z_i$ is the $i$th column of $Z^T$). Now we can construct the matrix $\tilde{Z}$ in the following three
steps.

1. Let $Q$ be a random $\pm 1/\sqrt{k}$ matrix of dimension $k \times m$ where $k = 24 \log n/\epsilon^2$.

2. Compute $Y = Q W^{1/2} B$. Note that this takes $2m \times 24 \log n/\epsilon^2 + m = \tilde{O}(m/\epsilon^2)$ time since $B$
has $2m$ entries and $W^{1/2}$ is diagonal.

3. Let $y_i$, for $1 \le i \le k$, denote the rows of $Y$, and compute $\tilde{z}_i = \mathrm{STSolve}(L, y_i, \delta)$ for each $i$.
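The three steps can be sketched in Python for a tiny graph. This is only an illustration under stated assumptions: an exact dense Gaussian elimination (grounding the last vertex, then shifting the result onto $\mathrm{span}(\mathbf{1})^\perp$) stands in for STSolve, so it does not have the nearly-linear running time, and the rows of $Q$ are generated on the fly instead of being stored. All function names are hypothetical.

```python
import math
import random


def solve(A, b):
    """Dense Gaussian elimination with partial pivoting (floating point)."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x


def laplacian_solve(L, y):
    """Exact stand-in for STSolve: returns L^+ y for y orthogonal to the all-ones vector."""
    n = len(L)
    x = solve([row[:-1] for row in L[:-1]], y[:-1]) + [0.0]  # ground the last vertex
    mean = sum(x) / n
    return [xi - mean for xi in x]  # shift onto span(1)^perp, which gives L^+ y


def build_z_tilde(n, edges, eps, rng=random):
    """Rows z_i = L^+ B^T W^{1/2} q_i for random +-1/sqrt(k) vectors q_i of length m."""
    k = math.ceil(24 * math.log(n) / eps ** 2)
    L = [[0.0] * n for _ in range(n)]
    for u, v, w in edges:
        L[u][u] += w
        L[v][v] += w
        L[u][v] -= w
        L[v][u] -= w
    s = 1 / math.sqrt(k)
    Z = []
    for _ in range(k):
        # One row of Y = Q W^{1/2} B, accumulated edge by edge without storing Q.
        y = [0.0] * n
        for u, v, w in edges:
            qe = s if rng.random() < 0.5 else -s
            y[v] += qe * math.sqrt(w)  # head of the edge
            y[u] -= qe * math.sqrt(w)  # tail of the edge
        Z.append(laplacian_solve(L, y))
    return Z, L


edges = [(0, 1, 1.0), (1, 2, 2.0), (2, 3, 1.0), (3, 0, 3.0), (0, 2, 1.0)]
Z, L = build_z_tilde(4, edges, eps=0.5, rng=random.Random(0))
for u, v in [(0, 1), (1, 3)]:
    exact = laplacian_solve(L, [float(i == u) - float(i == v) for i in range(4)])
    R = exact[u] - exact[v]  # (chi_u - chi_v)^T L^+ (chi_u - chi_v)
    approx = sum((row[u] - row[v]) ** 2 for row in Z)
    print(u, v, round(R, 4), round(approx, 4))
```

With $\epsilon = 0.5$ this uses $k = 134$ rows, and the approximations should typically land within the $(1 \pm \epsilon)$ window of Lemma 7.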
We now prove that, for our purposes, it suffices to call STSolve with

$$\delta = \frac{\epsilon}{3}\sqrt{\frac{2(1-\epsilon)\, w_{min}}{(1+\epsilon)\, n^3 w_{max}}}.$$
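To see why this choice of $\delta$ keeps the solver cost nearly linear, the Python sketch below (hypothetical function name) evaluates $\delta$ and $\log(1/\delta)$. For $\epsilon$ bounded away from 1, $\log(1/\delta) = O(\log n + \log r + \log(1/\epsilon))$ with $r = w_{max}/w_{min}$, so the $m \log(1/\delta)$ factor in Theorem 8 grows only logarithmically.

```python
import math


def solver_tolerance(eps, n, w_min, w_max):
    """delta = (eps/3) * sqrt(2 (1 - eps) w_min / ((1 + eps) n^3 w_max))."""
    return (eps / 3) * math.sqrt(2 * (1 - eps) * w_min / ((1 + eps) * n ** 3 * w_max))


for n, r in [(10 ** 3, 1), (10 ** 6, 1), (10 ** 6, 10 ** 4)]:
    delta = solver_tolerance(0.5, n, 1.0, float(r))
    print(n, r, delta, math.log(1 / delta))  # log(1/delta) grows slowly in n and r
```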



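Lemma 9. Suppose

$$(1-\epsilon) R_{uv} \le \|Z(\chi_u - \chi_v)\|^2 \le (1+\epsilon) R_{uv},$$

for every pair $u, v \in V$. If for all $i$,

$$\|z_i - \tilde{z}_i\|_L \le \delta \|z_i\|_L, \qquad (4)$$

where

$$\delta \le \frac{\epsilon}{3}\sqrt{\frac{2(1-\epsilon)\, w_{min}}{(1+\epsilon)\, n^3 w_{max}}}, \qquad (5)$$

then

$$(1-\epsilon)^2 R_{uv} \le \|\tilde{Z}(\chi_u - \chi_v)\|^2 \le (1+\epsilon)^2 R_{uv},$$

for every $u, v$.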

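Proof. Consider an arbitrary pair of vertices $u, v$. It suffices to show that

$$\big|\, \|Z(\chi_u - \chi_v)\| - \|\tilde{Z}(\chi_u - \chi_v)\| \,\big| \le \frac{\epsilon}{3}\|Z(\chi_u - \chi_v)\| \qquad (6)$$

since this will imply

$$\begin{aligned} \big|\, \|Z(\chi_u - \chi_v)\|^2 - \|\tilde{Z}(\chi_u - \chi_v)\|^2 \,\big| &= \big|\, \|Z(\chi_u - \chi_v)\| - \|\tilde{Z}(\chi_u - \chi_v)\| \,\big| \cdot \big(\|Z(\chi_u - \chi_v)\| + \|\tilde{Z}(\chi_u - \chi_v)\|\big) \\ &\le \frac{\epsilon}{3}\left(2 + \frac{\epsilon}{3}\right)\|Z(\chi_u - \chi_v)\|^2. \end{aligned}$$

Since $\frac{\epsilon}{3}(2 + \frac{\epsilon}{3}) \le \epsilon$ for $\epsilon \le 1$, this gives $(1-\epsilon)\|Z(\chi_u - \chi_v)\|^2 \le \|\tilde{Z}(\chi_u - \chi_v)\|^2 \le (1+\epsilon)\|Z(\chi_u - \chi_v)\|^2$, and combining this with the hypothesis on $Z$ yields the lemma.

As $G$ is connected, there is a simple path $P$ connecting $u$ to $v$. Applying the triangle inequality
twice, we obtain

$$\big|\, \|Z(\chi_u - \chi_v)\| - \|\tilde{Z}(\chi_u - \chi_v)\| \,\big| \le \|(Z - \tilde{Z})(\chi_u - \chi_v)\| \le \sum_{ab \in P}\|(Z - \tilde{Z})(\chi_a - \chi_b)\|.$$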
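We will upper bound this latter term by considering its square:

$$\begin{aligned} \Big(\sum_{ab \in P}\|(Z - \tilde{Z})(\chi_a - \chi_b)\|\Big)^2 &\le n \sum_{ab \in P}\|(Z - \tilde{Z})(\chi_a - \chi_b)\|^2 && \text{by Cauchy-Schwarz} \\ &\le n \sum_{ab \in E}\|(Z - \tilde{Z})(\chi_a - \chi_b)\|^2 \\ &= n\,\|(Z - \tilde{Z})B^T\|_F^2 && \text{writing this as a Frobenius norm} \\ &\le \frac{n}{w_{min}}\|W^{1/2} B (Z - \tilde{Z})^T\|_F^2 && \text{since } \|W^{-1/2}\|_2^2 \le 1/w_{min} \\ &= \frac{n}{w_{min}}\sum_i \|z_i - \tilde{z}_i\|_L^2 && \text{since } \|W^{1/2} B x\|_2^2 = \|x\|_L^2 \\ &\le \delta^2 \frac{n}{w_{min}}\sum_i \|z_i\|_L^2 && \text{by (4)} \\ &= \delta^2 \frac{n}{w_{min}}\|W^{1/2} B Z^T\|_F^2 \\ &= \delta^2 \frac{n}{w_{min}}\sum_{ab \in E} w_{ab}\|Z(\chi_a - \chi_b)\|^2 \\ &\le \delta^2 \frac{n}{w_{min}}\sum_{ab \in E} w_{ab}(1+\epsilon)R_{ab} \\ &= \delta^2 \frac{n(1+\epsilon)(n-1)}{w_{min}} && \text{by Lemma 3 (iii).} \end{aligned}$$

On the other hand,

$$\|Z(\chi_u - \chi_v)\|^2 \ge (1-\epsilon)R_{uv} \ge \frac{2(1-\epsilon)}{n\, w_{max}},$$

by Proposition 10. Combining these bounds, we have

$$\frac{\big|\, \|Z(\chi_u - \chi_v)\| - \|\tilde{Z}(\chi_u - \chi_v)\| \,\big|}{\|Z(\chi_u - \chi_v)\|} \le \delta\left(\frac{n(1+\epsilon)(n-1)}{w_{min}}\right)^{1/2}\left(\frac{n\, w_{max}}{2(1-\epsilon)}\right)^{1/2} \le \frac{\epsilon}{3} \quad \text{by (5)},$$

as desired. □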



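Proposition 10. If $G = (V,E,w)$ is a connected graph, then for all $u, v \in V$,

$$R_{uv} \ge \frac{2}{n\, w_{max}}.$$

Proof. By Rayleigh's monotonicity law (see [6]), each resistance $R_{uv}$ in $G$ is at least the corresponding
resistance $R'_{uv}$ in $G' = w_{max} \times K_n$ (the complete graph with all edge weights $w_{max}$) since $G'$ is
obtained by increasing weights (i.e., conductances) of edges in $G$. But by symmetry each resistance
$R'_{uv}$ in $G'$ is the same, and since $\sum_{uv} w_{max} R'_{uv} = n - 1$ by Lemma 3 (iii) applied to $G'$, it is exactly

$$R'_{uv} = \frac{\sum_{uv} R'_{uv}}{n(n-1)/2} = \frac{(n-1)/w_{max}}{n(n-1)/2} = \frac{2}{n\, w_{max}}.$$

Thus $R_{uv} \ge \frac{2}{n\, w_{max}}$ for all $u, v \in V$. □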
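Proposition 10 can be checked exactly on small graphs. The Python sketch below (illustrative names) computes $R_{uv}$ by grounding $v$ and injecting a unit current at $u$, which is equivalent to $(\chi_u - \chi_v)^T L^+ (\chi_u - \chi_v)$, and compares the result with $2/(n\, w_{max})$: equality holds for $w_{max} \times K_n$, and the bound holds for a path whose weights are at most $w_{max}$.

```python
from fractions import Fraction


def resistance(n, edges, u, v):
    """Exact R_uv: ground v at potential 0, inject a unit current at u, return the potential of u."""
    idx = [x for x in range(n) if x != v]  # vertices other than the grounded one
    pos = {x: i for i, x in enumerate(idx)}
    k = len(idx)
    # Augmented system [reduced Laplacian | chi_u], built edge by edge.
    A = [[Fraction(0)] * k + [Fraction(int(x == u))] for x in idx]
    for a, b, w in edges:
        for x, y in ((a, b), (b, a)):
            if x != v:
                A[pos[x]][pos[x]] += w
                if y != v:
                    A[pos[x]][pos[y]] -= w
    # Gauss-Jordan elimination; the reduced Laplacian of a connected graph is nonsingular.
    for c in range(k):
        p = next(r for r in range(c, k) if A[r][c] != 0)
        A[c], A[p] = A[p], A[c]
        piv = A[c][c]
        A[c] = [t / piv for t in A[c]]
        for r in range(k):
            if r != c and A[r][c] != 0:
                f = A[r][c]
                A[r] = [t - f * s for t, s in zip(A[r], A[c])]
    return A[pos[u]][k]


n, w_max = 5, 3
complete = [(a, b, w_max) for a in range(n) for b in range(a + 1, n)]
print(resistance(n, complete, 0, 1))  # 2/15 = 2 / (n * w_max)

path = [(0, 1, 1), (1, 2, 3), (2, 3, 2), (3, 4, 1)]  # every weight is at most w_max = 3
pairs = [(a, b) for a in range(n) for b in range(a + 1, n)]
print(min(resistance(n, path, a, b) for a, b in pairs))  # 1/3, which is at least 2/15
```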
Thus the construction of $\tilde{Z}$ takes $\tilde{O}(m \log(1/\delta)/\epsilon^2) = \tilde{O}(m \log r/\epsilon^2)$ time. We can then find
the approximate resistance $\|\tilde{Z}(\chi_u - \chi_v)\|^2 \approx R_{uv}$ for any $u, v \in V$ in $O(\log n/\epsilon^2)$ time simply by
subtracting two columns of $\tilde{Z}$ and computing the norm of their difference. □

Using the above procedure, we can compute arbitrarily good approximations to the effective
resistances $\{R_e\}$ which we need for sampling in nearly-linear time. By Corollary 6, any constant
factor approximation yields a sparsifier, so we are done.



5 An Additional Property 

Corollary 6 suggests that Sparsify is quite robust with respect to changes in the sampling prob-
abilities $p_e$, and that we may be able to prove additional guarantees on $H$ by tweaking them. In
this section, we prove one such claim.

The following property is desirable for using $H$ to solve linear systems (specifically, for the
construction of ultrasparsifiers [23, 24], which we will not define here):



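$$\text{For every vertex } v \in V, \quad \sum_{e \ni v} \frac{\tilde{w}_e}{w_e} \le 2\deg(v). \qquad (7)$$

Here $\deg(v)$ denotes the number of edges of $G$ incident to $v$.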

This says, roughly, that not too many of the edges incident to any given vertex get blown up too 
much by sampling and rescaling. We show how to incorporate this property into our sparsifiers. 

Lemma 11. Suppose we sample $q \ge 4n \log n/\beta$ edges of $G$ as in Sparsify with probabilities that
satisfy

$$p_{(u,v)} \ge \frac{\beta}{n \min(\deg(u), \deg(v))}$$

for some constant $0 < \beta < 1$. Then with probability at least $1 - 1/n$,

$$\sum_{e \ni v} \frac{\tilde{w}_e}{w_e} \le 2\deg(v) \quad \text{for all } v \in V.$$



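Proof. For a vertex $v$, define i.i.d. random variables $X_1, \ldots, X_q$ by:

$$X_i = \begin{cases} \frac{1}{p_e} & \text{if } e \ni v \text{ is the } i\text{th edge chosen} \\ 0 & \text{otherwise} \end{cases}$$

so that $X_i$ is set to $1/p_e$ with probability $p_e$ for each edge $e$ attached to $v$. Let

$$D_v = \sum_{e \ni v} \frac{\tilde{w}_e}{w_e} = \sum_{e \ni v} \frac{(\#\text{ of times } e \text{ is sampled})}{q p_e} = \frac{1}{q}\sum_{i=1}^q X_i.$$

We want to show that with high probability, $D_v \le 2\deg(v)$ for all vertices $v$. We begin by bounding
the expectation and variance of each $X_i$:

$$\mathbb{E}X_i = \sum_{e \ni v} p_e \frac{1}{p_e} = \deg(v),$$

$$\begin{aligned} \mathrm{Var}(X_i) \le \mathbb{E}X_i^2 = \sum_{e \ni v} p_e \frac{1}{p_e^2} = \sum_{e \ni v} \frac{1}{p_e} &\le \sum_{(u,v) \ni v} \frac{n \min(\deg(u), \deg(v))}{\beta} && \text{by assumption} \\ &\le \sum_{(u,v) \ni v} \frac{n \deg(v)}{\beta} = \frac{n \deg(v)^2}{\beta}. \end{aligned}$$

Since the $X_i$ are independent, the variance of $D_v$ is just

$$\mathrm{Var}(D_v) = \frac{1}{q^2}\sum_{i=1}^q \mathrm{Var}(X_i) \le \frac{n \deg(v)^2}{\beta q}.$$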

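We now apply Bennett's inequality for sums of i.i.d. variables (see, e.g., [20]), which gives a bound of the form

$$\Pr\big[|D_v - \mathbb{E}D_v| > \mathbb{E}D_v\big] \le \exp\left(-\frac{(\mathbb{E}D_v)^2}{\mathrm{Var}(D_v)\,(1 + \cdots)}\right),$$

where the correction term in $(1 + \cdots)$ depends on the largest value a single summand $X_i/q$ can take. We know that $\mathbb{E}D_v = \mathbb{E}X_i = \deg(v)$. Substituting our estimate for $\mathrm{Var}(D_v)$ and setting
$q \ge 4n \log n/\beta$ gives:

$$\Pr[D_v > 2\deg(v)] \le \exp\left(-\frac{\deg(v)^2}{\frac{n \deg(v)^2}{\beta q}(1 + \cdots)}\right) \le \exp\left(-\frac{\beta q}{2n}\right) \le \exp(-2 \log n) = 1/n^2,$$

since the correction factor $(1 + \cdots)$ is at most 2. Taking a union bound over all $v$ gives the desired result. □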
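Sampling with probabilities

$$p_e = \frac{1}{2}\left(\frac{w_e\|\tilde{Z} b_e^T\|^2}{\sum_{e} w_e\|\tilde{Z} b_e^T\|^2} + \frac{1}{n \min(\deg(u), \deg(v))}\right), \qquad e = (u,v),$$

satisfies the requirements of both Corollary 6 (with $\alpha = 2$) and Lemma 11 (with $\beta = 1/2$) and
yields a sparsifier with the desired property.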
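The mixed probabilities above can be computed as in the following Python sketch, where `est[e]` stands for $\|\tilde{Z} b_e^T\|^2$ and both the edge list and the estimates are hypothetical. Since $1/\min(\deg(u),\deg(v)) \le 1/\deg(u) + 1/\deg(v)$, the second terms sum to at most 1, so the whole mixture sums to at most 1; normalizing it (which `random.choices` does implicitly when given `weights`) can only increase each $p_e$, so lower bounds such as the one required by Lemma 11 are preserved.

```python
def combined_probs(n, edges, est):
    """Mixture from Section 5; est[e] plays the role of ||Z~ b_e^T||^2 for edge e = (u, v, w)."""
    deg = [0] * n  # unweighted degrees
    for u, v, _ in edges:
        deg[u] += 1
        deg[v] += 1
    total = sum(w * r for (_, _, w), r in zip(edges, est))
    return [0.5 * (w * r / total + 1 / (n * min(deg[u], deg[v])))
            for (u, v, w), r in zip(edges, est)]


# A star on 4 vertices (center 0) plus one extra edge; the est values are made up.
edges = [(0, 1, 1.0), (0, 2, 1.0), (0, 3, 1.0), (1, 2, 2.0)]
est = [0.9, 0.5, 1.0, 0.4]
p = combined_probs(4, edges, est)
print(sum(p))  # at most 1 in general; 0.8125 for this example
```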
Theorem 12. There is an $\tilde{O}(m/\epsilon^2)$ time algorithm which on input $G = (V,E,w)$, $\epsilon > 0$ produces
a weighted subgraph $H = (V,\tilde{E},\tilde{w})$ of $G$ with $O(n \log n/\epsilon^2)$ edges which, with probability at least
1/2, satisfies both (1) and (7).